Problems of structural isomerism in chemistry have received much attention.
But only occasional inroads have been made toward a systematic solution of
the underlying graph theoretical problems of structural isomerism. Solutions
in the past have been partial, with acyclic and cyclic structures being
treated independently. Recently the "boundaries, scope and limits" of
the subject of structural isomerism of acyclic molecules have been defined
by the DENDRAL algorithm^3. This algorithm permits an enumeration and
representation of all possible acyclic molecular structures with a given
empirical formula.

Acyclic molecules represent only a subset of molecular structures, however,
and it may be argued that cyclic structures (including those possessing
acyclic chains) are of more general interest and importance to modern
chemistry from both a practical and theoretical standpoint. An approach to
cyclic structure generation has appeared in a previous paper in this series^4.
That approach, which operates on a set of previously generated acyclic forms
by labelling hydrogen atoms pairwise and connecting the atoms to which they
are attached with a new bond, has one serious drawback. The approach cannot
make efficient use of the symmetry properties of cyclic graphs; hence an
inordinate amount of computer time must be spent in retrospective checking
of each candidate structure against existing structures to remove duplicates.

For this reason, an alternative approach to construction of cyclic molecules
has been developed. This approach is designed to take advantage of the
underlying graph theoretic considerations, primarily symmetry, to arrive at
a method for more efficient construction of a complete and irredundant list
of isomers for a given empirical formula. Central to the successful solution
of this problem is the generation of all positional isomers obtained by
substitutions on a given ring system. This topic has received attention for
nearly 100 years, with limited success^5. Its more general ramifications go
far beyond organic chemistry. Graph theoreticians have considered various
aspects of this topic, frequently, but not necessarily, in the context of
organic molecules. Polya has presented a theorem^6 which permits
calculation of the number of structural isomers for a given ring system.
Hill^7 has applied this theorem to enumeration of isomers of simple ring
compounds, and Hill and Taylor^8 have pointed out that Polya's theorem
permits enumeration of geometrical and optical isomers in addition to
structural isomers. More recently, formulae for enumeration of isomers of
monocyclic aromatic compounds based on graph theory, permutation groups and
Polya's theorem have been presented^9a. This history of interest and
results^5-9a provides only marginal benefit to the organic chemist. Although
the number of isomers may be interesting, these methods do not display the
structure of each isomer. Also, these methods do not provide information on
the more general case where the ring system is embedded in a more complex
structure. Even for simple cases the task of specifying each structure by
hand, without duplication, is an onerous one.

Balaban has published a series of papers^9 addressed, in part, to the problem
of specification of isomeric structures. Although his method, which differs
substantially from our own, involves significant manual effort and does not
appear to encompass a mechanism for prospective avoidance of duplicate
structures, his compilations of isomers of annulenes^9b,9c represent an
important contribution as extensions to the compilations of Lederberg.

(3) J. Lederberg, G.L. Sutherland, B.G. Buchanan, E.A. Feigenbaum,
A.V. Robertson, A.M. Duffield, and C. Djerassi, J. Amer. Chem. Soc., 91,
2973 (1969).

(4) Y.M. Sheikh, A. Buchs, A.B. Delfino, G. Schroll, A.M. Duffield,
C. Djerassi, B.G. Buchanan, G.L. Sutherland, E.A. Feigenbaum, and
J. Lederberg, Org. Mass Spectrom., 4, 493 (1970).

(5) See, for example, A.C. Lunn and J.K. Senior, J. Phys. Chem., 33, 1027
(1929) and references cited therein.

(6) a) G. Polya, Compt. rend., 201, 1167 (1935);
b) G. Polya, Helv. Chim. Acta, 19, 22 (1936);
c) G. Polya, Z. Krist., 93, 415 (1936);
d) G. Polya, Acta Math., 68, 145 (1937).

(7) a) T.L. Hill, J. Phys. Chem., 47, 253 (1943);
b) T.L. Hill, ibid., p. 413;
c) T.L. Hill, J. Chem. Phys., 11, 294 (1943).

(8) W.J. Taylor, J. Chem. Phys., 11, 532 (1943).
METHOD

OVERVIEW

Framework. The framework for this method is that chemical structures consist
of some combination of acyclic chains and rings or ring systems^10,11. The
problem of construction of acyclic isomers (and radicals) has been solved
previously. If all possible ring systems can be constructed from all or part
of the atoms in the empirical formula, and all possible acyclic parts are
available from the acyclic generator, the combination of ring systems with
acyclic parts in all unique ways would yield the complete list of isomers.
The method for construction of ring systems is described below. This
description employs some terms which require definition. The definitions
also serve to illustrate the taxonomic principles which underlie the
operation of the structure generator. The generator's view of molecular
structure differs in some respects from the chemist's. A chemist, for
example, may view structures possessing the same functional group or ring
as related. The generator works at the more fundamental level of the
vertex-graph^10, as described below.

(9) a) A.T. Balaban and F. Harary, Rev. Rom. Chim., 12, 1511 (1967);
b) ibid., 11, 1097 (1966); Erratum, ibid., 12, No. 1, 103 (1967);
c) ibid., 17, 865 (1972); d) ibid., 18, 635 (1973), and additional
references cited therein.

(10) J. Lederberg, DENDRAL-64, Part I. Rotational Algorithm for Tree
Structures, NASA Star No. N65-13158, NASA CR-57029; Part II. Topology of
Cyclic Graphs, NASA Star No. N66-14074, NASA CR-68898; Part III. Complete
Chemical Graphs: Embedding Rings in Trees, NASA Star No. N71-76061,
NASA CR-123176.

(11) It is assumed that structures are completely connected by chemical
bonds; thus catenanes and threaded structures are viewed as consisting of
separate molecules.

Chemical Graph. A molecular structure may be viewed as a graph, termed the
chemical graph, or skeleton. A chemical graph consists of nodes, with
associated atom names, and edges, which correspond to chemical bonds.
Consider as an example the substituted piperazine, 1, whose chemical graph
is illustrated in Chart I as 2. Note that hydrogen atoms are ignored by
convention, while the symbol "U" is used to specify the unsaturation. The
degree (primary, secondary, ...) of a node in the chemical graph has its
usual meaning, i.e., the number of (non-hydrogen) edges connected to it.
The valence of each atom determines its maximum degree in the graph. As
usually displayed by chemists in planar representation, the chemical graph
describes the connectivity rather than the geometric configuration of a
molecular structure.

Superatom. In general, a chemical graph can be separated into cyclic and
acyclic parts. Each cyclic structural sub-unit may be deemed a superatom
possessing any number of free valences^12. The chemical graph 2 arises from
a combination of two carbon atoms with ring-superatom 3. Ring-superatom 3
possesses the indicated free valences to which the remaining hydrogen and
two methyl radicals will be attached (Chart I).

Ciliated Skeleton. A ciliated skeleton is a skeleton with free valences but
without atom names. Ring-superatom 3 arises from the ciliated skeleton 4 by
associating the atom names of eight carbon and two nitrogen atoms with the
skeleton (Chart I).

Cyclic Skeleton. A chemical graph whose nodes are not associated with atom
names and which contains no acyclic parts and no free valences is termed a
cyclic skeleton. Ciliated skeleton 4 arises from one way of associating
sixteen free valences with the nodes on the cyclic skeleton 5 (Chart I).

(12) A free valence is a bond with an unspecified terminus. Any
substructure, cyclic or not, may be treated as a superatom; however, the
term, in this paper, is generally restricted to cyclic (termed ring-)
superatoms.

Vertex-Graph. Vertex-graphs^10 are cyclic skeletons from which nodes of
degree less than three have been deleted. The vertex-graph of the cyclic
skeleton 5 is the regular trivalent graph^10 of two nodes, 6. Note that the
remaining nodes of the cyclic skeleton 5 are of degree two. Removal of these
secondary nodes from 5 while retaining the interconnections of the two
tertiary nodes yields 6 (Chart I).

As an illustration of the variety of structures which may be constructed
from a given vertex-graph and empirical formula, for example, C10H20N2,
consider that graph 6 is the vertex-graph for all bicyclic ring systems
(excluding spiro forms). Cyclic skeletons 7 and 8 (Chart I), for example,
may be constructed from eight secondary nodes and 6. There are many ways of
associating sixteen free valences with each cyclic skeleton, resulting in a
larger number of ciliated skeletons. For example, 9 and 10 arise from
different allocations of sixteen free valences to 5 (Chart I). There is only
one way to associate eight carbon atoms and two nitrogen atoms with each
ciliated skeleton to yield superatoms (e.g. 11 and 12, Chart I). However,
several structures are obtained by associating the remaining two carbon
atoms (in this example) with each superatom, as an ethyl or two methyl
groups. Chemical graphs 13 and 14, for example, arise from two alternative
ways of associating two methyl groups with superatom 3.

Chart I. Conventional representation (composition C10H20N2); chemical graph
(composition C10N2U2); superatoms (ring-superatom, composition C8N2U2;
acyclic superatom, composition C2); ciliated skeleton; cyclic skeleton;
vertex-graph. [Structure drawings not reproduced.]

Multiple Bonds. For the purposes of this program we adopt the formalism
that all multiple bonds (double, triple, ...) are considered to be small
rings. Previous versions^3 (acyclic generator) differ from this program in
that double and triple bonds are regarded as specially labelled edges.

AIMS

The structure generator must produce a complete list of structures without
duplication. By duplicate structures we mean structures which are
equivalent in some well-defined sense. The class of isomers generated by
the program includes only connectivity isomers. Transformations (utilized
to determine equivalence) allowed under connectivity symmetry preserve the
valence and bond distribution of every atom. Connectivity symmetry does
not consider bond lengths or bond angles. This choice of symmetry results
in construction of a set of topologically unique isomers. Equivalence is
discussed in more detail in Appendix A and in the accompanying paper^13; a
discussion of isomerism and symmetry is presented in Appendix B.

(13) L. Masinter, N.S. Sridharan, J. Amer. Chem. Soc., 00, 0000 (1973).

STRATEGY

The strategy behind the cyclic structure generator is strongly tied to the
framework described above. The strategy is summarized in greatly simplified
form in Figure 1. The vertex-graphs from which structures are constructed
can be specified for a given problem by a series of calculations. Thus Part
A of the program (Figure 1) partitions the pot of atoms in all possible
ways; each partition consists of those atoms assigned to one or more
"superatompots" and a "remaining pot." Each superatompot is a collection of
atoms from which all possible, unique ring-superatoms can be constructed
based on the appropriate vertex-graphs (Part B, Fig. 1). Each ring-superatom
will be a ring system in completed structures. The atoms in the remaining
pot will form acyclic parts of the final structures when combined in all
possible, unique ways with the ring-superatoms from the corresponding
initial partition (Part C, Fig. 1).

DESCRIPTION

We are faced with the difficulty of describing a complex computer program
in the traditional mode of presentation in a scientific journal. The
narrative form is not the ideal medium for this description; simple
examples do not always indicate all essential aspects of a program. A
deeper understanding of a program could be engendered through the use of a
large number of well chosen examples, but the length of such a presentation
would be excessive and would tax the patience of even the most interested
reader. We are thus aware of the insufficiency of considering only one
example in the following written description. We have adopted the strategy
of presenting essential aspects of the procedure for structure generation
in the main body of the text. Details of the description which might
obscure the principal concepts are placed in Appendices C and D.
Mathematical details are available elsewhere^14,15. We hope this serves the
purpose of providing the casual reader with a deeper understanding of the
method without having to contend with details which, on the other hand, are
important to others who wish to make use of our approach.

The example chosen to illustrate each step of the method is C6H8 (or C6U3,
as there are three degrees of unsaturation).

This example does not contain bivalent or trivalent atoms (e.g., oxygen and
nitrogen, respectively) or atoms of valence greater than four, nor any
univalent atoms other than hydrogen (e.g., chlorine, fluorine).


Partitioning and Labelling. The mechanism for structure generation involves
a series of "partitioning" steps followed by a series of "labelling" steps.
Partitions are made of items which must be assigned to objects (usually
graph structures or parts thereof) as the molecular structures are built up
from the vertex-graphs. The process by which items are assigned to the
graphs is termed labelling^14.

Examination of Chart I reveals the different types of items involved. For
example, nodes are partitioned among and labelled upon the edges of the
vertex-graphs to yield the cyclic skeletons. Free valences are partitioned
among and labelled upon the nodes of cyclic skeletons to yield ciliated
skeletons, and so forth.

Partitioning steps in the subsequent discussion are carried out assuming
that objects among which items are partitioned are indistinguishable.
Distinguishability of objects (edges, nodes, ...) is specified during
labelling and will be discussed in a subsequent section. The partitioning
steps performed by the program are outlined in Table I. Each step is
described in more detail below.

(14) (a) H. Brown, L. Masinter and L. Hjelmeland, Discrete Mathematics, in
press; (b) Stanford Computer Science Memo STAN-CS-72-0318.

(15) (a) H. Brown and L. Masinter, Discrete Mathematics, submitted;
(b) Stanford Computer Science Memo STAN-CS-73-0361.

Table I.  Partitioning Steps Performed by the Structure Generator

Step #   Partition                        Among
------   ------------------------------   -------------------------------
1        Atoms and Unsaturations in       Superatompots and Remaining Pot
         Empirical Formula
2        Free Valence                     Atoms in Superatompot
3        Secondary Nodes                  Loops / Non-loops
4        Non-loop Secondary Nodes         Edges of Graph
5        Loop Secondary Nodes             Loops
6        Ring-superatoms and              Efferent Links (see Appendix D)
         Remaining Pot

PART A.

Ring-superatoms are "two-connected" structures, i.e., the ring-superatom
cannot be split into two parts by scission of a single bond. The atoms in
an empirical formula may be distributed among from one to several such
two-connected ring-superatoms. A distribution which allots atoms to two or
more superatompots will yield (respectively) structures containing two or
more ring-superatoms linked together by single bonds (or acyclic chains)^16.

In the generation process, one must find all possible ways of partitioning
the given formula into superatompots and a remaining pot, such that
molecules can be constructed. The considerations in forming superatom
partitions deal primarily with valence and unsaturation. This procedure is
summarized in Appendix C, Superatom Partitions. The partitions which result
are summarized in Table II.

(16) Chemists are more familiar with terms such as rings or ring systems.
The term two-connected is used here in conjunction with ring-superatoms for
a more precise description. For example, biphenyl may be viewed as a single
ring system or two rings depending on the chemical context. In this work,
however, biphenyl consists of two ring-superatoms (two phenyl rings) linked
by a single bond.



Table II.  Allowed Partitions of C6U3 Into Superatompots and Remaining Pot.

Partition   Number of        Superatompot Number        Remaining
Number      Superatompots    1        2        3        Pot
---------   -------------    -----    -----    -----    ---------
 1          1                C6U3     -        -        -
 2          1                C5U3     -        -        C1
 3          1                C4U3     -        -        C2
 4          1                C3U3     -        -        C3
 5          2                C4U2     C2U1     -        -
 6          2                C3U2     C2U1     -        C1
 7          2                C2U2     C2U1     -        C2
 8          2                C4U1     C2U2     -        -
 9          2                C3U1     C2U2     -        C1
10          2                C3U2     C3U1     -        -
11          3                C2U1     C2U1     C2U1     -
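
The partitions of Table II can be reproduced for the all-carbon example by a small enumeration. The rules used below are assumptions chosen for this sketch, since the actual procedure is given in Appendix C and is not reproduced here: every superatompot holds at least two atoms and at least one unsaturation, all unsaturations are allotted to superatompots, and a superatompot that must bond to another part needs at least one free valence. The names `pot_free_valence` and `superatom_partitions` are hypothetical.

```python
from itertools import combinations_with_replacement

def pot_free_valence(atoms, unsaturations, valence=4):
    # Total valence minus two ends for every bond inside the superatom;
    # a connected unit of n atoms with U unsaturations has n - 1 + U bonds.
    return valence * atoms - 2 * (atoms - 1 + unsaturations)

def superatom_partitions(atoms, unsaturations):
    """List (superatompots, remaining_atoms) for identical tetravalent atoms.

    Each superatompot is an (atoms, unsaturations) pair. The admissibility
    rules are assumptions of this sketch, not the rules of Appendix C.
    """
    candidates = [(a, u) for a in range(2, atoms + 1)
                  for u in range(1, unsaturations + 1)]
    found = []
    for k in range(1, unsaturations + 1):
        for pots in combinations_with_replacement(candidates, k):
            used = sum(a for a, _ in pots)
            if used > atoms or sum(u for _, u in pots) != unsaturations:
                continue
            remaining = atoms - used
            pieces = k + (1 if remaining else 0)
            needed = 1 if pieces > 1 else 0   # must be able to bond to the rest
            if all(pot_free_valence(a, u) >= needed for a, u in pots):
                found.append((pots, remaining))
    return found

for pots, remaining in superatom_partitions(6, 3):
    names = ["C%dU%d" % pot for pot in pots]
    print(names, "remaining pot:", "C%d" % remaining if remaining else "-")
```

With these assumed rules the enumeration for six tetravalent atoms and three unsaturations yields eleven partitions, the same set as Table II although not in the same order. A pot such as C2U3 is rejected because it would have no free valence left to attach the four remaining atoms.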
PART B. Ring-Superatom Construction.

Each partition (Table II) must now be treated in turn. The complete set of
ring-superatoms for each superatompot in a given partition must be
constructed. The major steps in the procedure are outlined in Figure 2.

Valence List. The first step in Part B is to strip the superatompot of atom
names, while retaining the valence of each atom. The numbers of each type
of atom are saved for later labelling of the ciliated skeletons (Chart I).
A valence list may then be specified, giving in order the number of bi-,
tri-, tetra- and n-valent nodes which will be incorporated in the
superatom. Thus the superatompot C6U3 is transformed into the valence list
0 bivalents, 0 trivalents, 6 tetravalents (0, 0, 6), and C4 becomes
(0, 0, 4) (Figure 2).

Calculation of Free Valence. From the valence list and the associated
unsaturation count, the number of free valences of each superatompot is
determined uniquely (see Calculation of Free Valence, Appendix C). For C6U3
the free valence is eight (Fig. 2). The free valence of a superatom
represents the number of bonding sites which can connect to hydrogen atoms,
other superatoms or atoms in the remaining pot.
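
A brief sketch of these two steps follows. The formula is an assumption of this example rather than a quotation of Appendix C: a connected superatom of n nodes with U unsaturations contains n - 1 + U internal bonds, each consuming two valences, so the free valence is the total valence minus 2(n - 1 + U). It agrees with the two values stated in the text (eight for C6U3, sixteen for the ring-superatom C8N2U2 of Chart I). The names `VALENCE`, `valence_list` and `free_valence` are hypothetical.

```python
# Assumed standard valences for the atom names used in the text.
VALENCE = {"O": 2, "N": 3, "C": 4}

def valence_list(pot):
    """Return (bivalent, trivalent, tetravalent) node counts for a pot
    given as a mapping of atom name to count, e.g. {"C": 6}."""
    counts = [0, 0, 0]
    for atom, number in pot.items():
        counts[VALENCE[atom] - 2] += number
    return tuple(counts)

def free_valence(vlist, unsaturations):
    """Free valences left after forming the n - 1 + U internal bonds."""
    nodes = sum(vlist)
    total_valence = sum((i + 2) * count for i, count in enumerate(vlist))
    return total_valence - 2 * (nodes - 1 + unsaturations)

print(valence_list({"C": 6}))                            # the C6U3 example
print(free_valence(valence_list({"C": 6}), 3))
print(free_valence(valence_list({"C": 8, "N": 2}), 2))   # ring-superatom 3
```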

Partitioning of Free Valence. The free valences are then partitioned among
the nodes in the valence list in all possible, unique ways (see Appendix C,
Partitioning of Free Valence).
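
One way to carry out such a partitioning is sketched below. It rests on an assumption not spelled out in this section: since ring-superatoms are two-connected, every node is taken to keep at least two valences for bonds within the ring system, so a node of valence v receives at most v - 2 free valences. Nodes of equal valence are treated as indistinguishable by generating only non-increasing allocations within each valence class. The function name `free_valence_partitions` is hypothetical.

```python
def free_valence_partitions(valences, total):
    """Distribute `total` free valences over nodes with the given valences.

    Returns a list of tuples; entry i is the number of free valences on
    node i (nodes are sorted by decreasing valence). Assumes each node
    keeps at least two valences for bonds inside the ring-superatom.
    """
    valences = sorted(valences, reverse=True)

    def extend(i, left, prefix):
        if i == len(valences):
            if left == 0:
                yield tuple(prefix)
            return
        cap = valences[i] - 2
        if i > 0 and valences[i] == valences[i - 1]:
            # Same valence class: allow only non-increasing allocations so
            # that each unordered partition is produced exactly once.
            cap = min(cap, prefix[-1])
        for f in range(min(cap, left), -1, -1):
            yield from extend(i + 1, left - f, prefix + [f])

    return list(extend(0, total, []))

# Eight free valences on the six tetravalent nodes of C6U3.
for allocation in free_valence_partitions([4] * 6, 8):
    print(allocation)
```

Under this assumption the C6U3 example has three partitions of its eight free valences, one of which is the 2,2,2,2,0,0 partition discussed in the text.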
Degree List. Each partition of free valences alters the effective valence
of the nodes in the original valence list with respect to the
ring-superatom. In the example, assignment of one or two free valences to a
tetravalent node transforms this node into a tri- or bivalent node
respectively. As the ring-superatom is constructed, those tetravalent nodes
which have been assigned, say, two free valences, have then only two
valences remaining for attachment to the ring-superatom^17. These nodes are
then of degree two and may be termed secondary nodes. Thus the partition of
free valences 2,2,2,2,0,0 on six tetravalent nodes yields the degree list
(4,0,2) (Fig. 2), as four of the tetravalent nodes receive two free
valences each, yielding four nodes of degree two (secondary) and leaving
two nodes of degree four (quaternary). The program keeps track of the
number of free valences assigned to all nodes for use in a subsequent
step.
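
The derivation of a degree list from a partition of free valences amounts to a subtraction and a tally, as the following sketch shows. The function name `degree_list` and the restriction to degrees two through four are assumptions of this example.

```python
def degree_list(valences, free_valences):
    """Count nodes of degree two, three and four (in that order) once each
    node's free valences are set aside."""
    counts = [0, 0, 0]
    for valence, free in zip(valences, free_valences):
        degree = valence - free
        if not 2 <= degree <= 4:
            raise ValueError("degree %d is outside the range 2-4" % degree)
        counts[degree - 2] += 1
    return tuple(counts)

# The example from the text: 2,2,2,2,0,0 on six tetravalent nodes.
print(degree_list([4] * 6, (2, 2, 2, 2, 0, 0)))
```

The same function applied to the allocations 2,2,2,1,1,0 and 2,2,1,1,1,1 gives (3,2,1) and (2,4,0) respectively.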


Loops. As will be clarified in the subsequent discussion, there are several
general types of ring-superatoms which cannot be constructed from the
vertex-graphs available in the CATALOG (described below). These are all
cases of multiple extended unsaturations either in the form of double bonds
or rings. Examples are the following:

1) bi-, tri-, ..., n-cyclics with exocyclic double bonds;

2) some types of spiro ring systems;

3) allenes extended by additional double bonds, e.g., C=C=C=C.

The concept of a loop, each loop consisting of a single unsaturation and at
least one bivalent node, must be utilized for these cases. Examples of loops
containing one, two and three bivalent nodes are shown in Chart II. Note
that the two remaining "ends" of the unsaturation will yield a "looped
structure" when attached to a single node in a graph (Chart II).

Chart II. Loops containing one, two and three bivalent nodes. [Drawings not
reproduced.]

The method for specification of loops is discussed in Calculation of Loops,
Appendix C.

(17) Use of the term degree with reference to the degree list refers to the
number of bonds other than free valences, with double bonds being counted
twice. A free valence may or may not eventually be attached to a hydrogen
atom in the final structure.

Partitioning of Secondary Nodes among Loops and Non-loops. The secondary
nodes in the degree list are partitioned between the loops (if any)
calculated in the previous step and the remaining non-loop portion of the
eventual graph. Aspects of this partitioning step are presented in
Partitioning of Secondary Nodes Among Loops and Non-Loops, Appendix C.
Results for the example are indicated in Figure 2.

This procedure yields the reduced degree list, which contains none of the
secondary nodes originally present in the degree list. Any secondary nodes
appearing in the reduced degree list are termed "special" secondary nodes,
as these nodes will have loops attached in subsequent steps.

The reduced degree lists are used to specify a set of vertex-graphs for the
eventual ring-superatoms. All two-connected structures can be described by
their vertex-graphs, which are, for most structures, regular trivalent
graphs. This concept has been described in detail by Lederberg^10, who has
also presented a generation and classification scheme for such graphs.
Given a set of all vertex-graphs, the set of all ring-superatoms may be
specified^10. The vertex-graphs are maintained by the program in the
CATALOG. Catalog entries for regular trivalent graphs possessing two and
four nodes are presented in Table III. This list must be supplemented by
additional vertex-graphs to cover several special cases required for
generation of all structures for the example. These are also presented in
Table III. With the reduced degree list of a superatompot, the program
requests the appropriate CATALOG entries. In the example (Fig. 2), the
reduced degree list (0,0,2) specifies vertex-graphs containing two
quaternary nodes (tetravalent dihedron). The reduced degree list (0,4,0)
specifies regular trivalent graphs of four nodes, of which there are two:
4AA and 4BB (Table III). When only secondary nodes are present in the
reduced degree list, the graph "Singlering" (Table III) is utilized.

TABLE III.  Vertex-Graphs Necessary for Construction of Isomers of C6H8.
This is a Partial Listing of the Catalog.(a)
[Planar representations not reproduced.]

                           Number of Nodes
                           of Degree
Name(b)                    Three   Four   Remarks
------------------------   -----   ----   -------------------------------
2A (hosohedron)            2       0      Regular trivalent graph of two
                                          nodes
4AA                        4       0      Regular trivalent graphs of
4BB (tetrahedron)          4       0      four nodes
"Singlering k"             0       0      A single ring composed of k
                                          secondary nodes
Tetravalent Dihedron       0       2      Two nodes of degree four
"Daisy"                    0       1      A single quaternary node
[name illegible in         2       1      -
source; apparently 3BCB]

(a) The listing of reference 10 has been expanded to include vertex-graphs
of other combinations of nodes of degree three and four. The completeness
of the catalog has been verified where possible by independent graph
construction methods^18 and by comparison with Balaban's compilations^9b,9c
where appropriate.

(b) Names, except those in quotation marks, taken from Lederberg.^10

(18) a) N.S. Sridharan, unpublished results; b) L. Masinter, unpublished
results.
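
The CATALOG request can be pictured as a table lookup keyed on the numbers of tertiary and quaternary nodes in the reduced degree list. The dictionary below is only a sketch: its contents are limited to the entries named in Table III, the entry for two tertiary nodes and one quaternary node is inferred from the description of the 3BCB graph rather than stated in the text, and the names `CATALOG` and `vertex_graphs_for` are hypothetical. How the program treats special secondary nodes that accompany other nodes is not modelled here.

```python
# Keys are (tertiary nodes, quaternary nodes); values are catalog names.
CATALOG = {
    (2, 0): ["2A"],
    (4, 0): ["4AA", "4BB"],
    (0, 2): ["Tetravalent Dihedron"],
    (0, 1): ["Daisy"],
    (2, 1): ["3BCB"],   # inferred: two tertiary nodes, one quaternary node
}

def vertex_graphs_for(reduced_degree_list):
    """Return the catalog names matching a reduced degree list given as
    (secondary, tertiary, quaternary) node counts."""
    _secondary, tertiary, quaternary = reduced_degree_list
    if tertiary == 0 and quaternary == 0:
        return ["Singlering"]        # only secondary nodes are present
    return CATALOG.get((tertiary, quaternary), [])

print(vertex_graphs_for((0, 0, 2)))
print(vertex_graphs_for((0, 4, 0)))
print(vertex_graphs_for((4, 0, 0)))
```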

Up to this point the program has effectively decomposed the problem into a
series of subproblems, working down from the total pot of atoms through a
series of partitions and subpartitions to the set of possible
vertex-graphs. In subsequent steps the vertex-graphs are expanded to the
final structures by a series of constructive graph labellings (Table IV).

Table IV.  The Six Graph Labelling Steps Performed by the Labelling
Algorithm(a)

Labelling Step   Function
--------------   ------------------------------------------------------
1                Label Edges of Vertex-Graphs with Special Secondary
                 Nodes
2                Label Edges of Resulting Graphs with Non-Loop
                 Secondary Nodes
3                Label Loops of Resulting Graphs with Loop Secondary
                 Nodes
4                Label Nodes of Cyclic Skeletons with Free Valences
5                Label Nodes of Ciliated Skeletons with Atom Names
6                Label Free Valences of Superatoms with Radicals
                 (see Appendix D)

Labelling Edges of Vertex-Graphs with Special Secondary Nodes. Special
secondary nodes are those that will have loops attached. The specification
of the possible attachments of the nodes to the graph is a "labelling"
procedure. This is the first of six such graph labelling steps performed by
the program (Table IV). All of these labelling steps involve the same
combinatorial problem, that of associating a set of n labels, not
necessarily distinct, with a set of objects with arbitrary symmetry^13. The
same labelling algorithm is utilized for each of the six labelling steps. A
description of the underlying mathematics and proof of completeness and
irredundancy appears separately^14.

Some aspects of the first labelling step indicate how equivalent labellings
(which would eventually yield duplicate structures) may be avoided
prospectively, by recognition of the symmetry properties of the graph; in
the first labelling, the vertex-graph. These symmetry properties are
expressed in terms of the permutation group (see Appendix A and refs. 13
and 14) on the edges of the vertex-graph. This permutation group, which
defines the equivalence of the edges, may be specified in the CATALOG or,
alternatively, calculated as needed by a separate part of the structure
generator. As subsequent steps are executed, a new permutation group (e.g.,
on the nodes for labelling step four, Table IV) is derived as necessary^13.
Thus, only labellings which result in unique expansions of the structure
are permitted. The reader examining Fig. 2 may note that for this simple
example the symmetries of the vertex-graphs and subsequent skeletons can be
discerned easily by eye. For example, all edges of the tetravalent dihedron
are equivalent, as are all the edges of the regular trivalent graphs 2A and
also 4BB. The 3BCB graph (Table III, Fig. 2) has four equivalent edges and
one other edge, and so forth. In the general case, however, the symmetries
of the vertex-graphs and subsequent expansions thereof are not always
obvious.
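
Edge equivalence classes follow directly from a permutation group once generators are known, as the sketch below illustrates. The 3BCB graph is assumed here to consist of two tertiary nodes a and b and one quaternary node c, joined by one a-b edge and by doubled a-c and b-c edges; this structure is an inference from the statement that the graph has four equivalent edges and one other, and the edge numbering, the generators and the function name `orbits` are all hypothetical.

```python
def orbits(size, generators):
    """Split range(size) into orbits under permutations written as tuples,
    where g[i] is the image of object i."""
    seen, result = set(), []
    for start in range(size):
        if start in seen:
            continue
        orbit, frontier = {start}, [start]
        while frontier:
            x = frontier.pop()
            for g in generators:
                y = g[x]
                if y not in orbit:
                    orbit.add(y)
                    frontier.append(y)
        seen |= orbit
        result.append(sorted(orbit))
    return result

# Assumed edge numbering for 3BCB: 0, 1 = the two a-c edges,
# 2, 3 = the two b-c edges, 4 = the a-b edge.
GENERATORS_3BCB = [
    (1, 0, 2, 3, 4),   # exchange the two a-c edges
    (0, 1, 3, 2, 4),   # exchange the two b-c edges
    (2, 3, 0, 1, 4),   # exchange nodes a and b
]

print(orbits(5, GENERATORS_3BCB))
```

With this assumed group the five edges fall into one class of four and one class of one, in agreement with the description in the text.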

With the group on the edges specified, the labelling of the vertex-graphs
with special secondary nodes is carried out. The results of this procedure
for partitions containing loops are indicated in Figure 2.

Labelling with Non-loop Secondary Nodes. The graphs which resulted from
the previous labelling are now labelled with the partitions of non-loop
secondary nodes (see Partitioning of Non-Loop Secondary Nodes Among
Edges, Appendix C). Each of the five partitions for the tetravalent dihedron
in Fig. 2 results in a single labelling, as all four edges of the graph are
equivalent. When edges are distinguishable there may be several ways to
label a graph with a single partition. There are, for example, for the
S3BCB graph, two ways to label with the partition 3,0,0,0,0, four ways with
the partition 2,1,0,0,0 and three ways with the partition 1,1,1,0,0
(Fig. 2).
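
The counts quoted for the graph with four equivalent edges and one other edge can be checked by brute force. The sketch below is only an illustration: it assumes (an inference from the text, not something taken from the CATALOG) that the graph consists of one quaternary node joined by a double edge to each of two trivalent nodes, which are themselves joined by a single edge. The edge numbering, the three generators and the function names are hypothetical.

```python
from itertools import permutations

def generate_group(generators):
    """Close a list of permutations (tuples: index -> image) under composition."""
    identity = tuple(range(len(generators[0])))
    group = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            gh = tuple(h[g[i]] for i in range(len(g)))  # apply g, then h
            if gh not in group:
                group.add(gh)
                frontier.append(gh)
    return group

def count_labellings(partition, group):
    """Count placements of the parts of `partition` on the edges, up to `group`."""
    seen = set()
    classes = 0
    for labelling in set(permutations(partition)):
        if labelling in seen:
            continue
        classes += 1
        for g in group:  # mark the whole orbit of this labelling as seen
            image = [0] * len(labelling)
            for edge, label in enumerate(labelling):
                image[g[edge]] = label
            seen.add(tuple(image))
    return classes

# Assumed edge numbering: 0,1 and 2,3 are the two double edges, 4 is the single edge.
EDGE_GROUP = generate_group([
    (1, 0, 2, 3, 4),  # swap the two edges of the first double edge
    (0, 1, 3, 2, 4),  # swap the two edges of the second double edge
    (2, 3, 0, 1, 4),  # exchange the two trivalent nodes
])

for partition in [(3, 0, 0, 0, 0), (2, 1, 0, 0, 0), (1, 1, 1, 0, 0)]:
    print(partition, count_labellings(partition, EDGE_GROUP))
```

Under these assumptions the three partitions give 2, 4 and 3 labellings, in agreement with the text. The generator itself avoids duplicates prospectively; this exhaustive orbit search is only practical for small graphs.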

Labelling Loops with Secondary Nodes. There remain unassigned to the graphs
at this point only secondary nodes which were assigned to loops. These
nodes are first partitioned among the loops (see Partitioning of Loop
Secondary Nodes Among Loops, Appendix C). For example,
following the path from the degree list (4,0,2) through labelling
with non-loop secondary nodes (Fig. 2), there are two ways of
labelling the two equivalent loops with four secondary nodes. There
is one way to label the two loops of the adjacent graph with three
secondary nodes and one way of labelling the two loops of each of the
two remaining graphs in this section of Figure 2 with two secondary
nodes. In this example (C6U3) the loops in every case are equivalent
or there is only one loop to be labelled. In the general case loops
may not be equivalent, resulting in a greater number of ways to label
loops with a given partition of secondary nodes.

The previous labelling steps specified the number
of secondary nodes on each edge of and loop attached to the vertex-
graphs. All atoms in the original superatompot are thus accounted
for. A representation of the result is the cyclic skeleton, where
nodes and their connections to one another are specified. (These
skeletons begin to resemble conventional chemical structures.)

Labelling with Free Valences. The nodes in a cyclic skeleton are
then labelled with free valences, yielding ciliated skeletons. This
labelling is trivial in the example, as all atoms are of the same
valence (four) (Figure 2). Free valence labelling is performed with
knowledge of how many atoms of each valence were present in the
original superatompot, but independent of the identities of the
atoms. The combinatorial complexity of this labelling problem follows
from the possible occurrence of atoms with differing valences. In the
general case there may be several ways to perform this labelling on a
single cyclic skeleton, whereas in the C6U3 example there is only one
way.

Labelling with Atom Names. The nodes of a ciliated skeleton are
then labelled with atom names to yield the ring-superatom(s). Again
this labelling is trivial in the example, as only one type of atom is
present (carbon), yielding in each case only a single superatom (Fig.
2). If there is more than one type of atom with the same valence
(e.g., silicon and carbon), the labelling problem is more complex.
Each node of appropriate valence may be labelled with either type of
atom. Duplicate structures are avoided by calculations involving the
group pertaining to the set of nodes of equal valence.


PART  C.  Acyclic  Generator. 

The  superatom  partition  expanded  in  the  example  had  no  atoms  assigned  to 
acyclic  chains  (remaining  pot).  The  set  of  ring-superatoms  on  completion  of 
Part  B,  above,  thus  yields  the  set  of  36  structures  on  placement  of  a 
hydrogen  atom  on  each  free  valence  (Fig.  2).  If  the  superatom  partition 
(partitions 2-11, Table II) contained more than one superatompot or
any  atoms  in  the  remaining  pot,  the  acyclic  generator  must  be  used  to 
connect  the  segments  of  the  structure  in  all  ways.  This  procedure  is 
described  in  detail  in  Appendix  D. 
DISCUSSION


C6H8. The example (Fig. 2) has considered only expansion of a single
superatom partition. It might be instructive for the reader to attempt to
generate all, or at least the remaining, structures for C6H8. The number of
solutions is presented in a subsequent section. If the algorithm as outlined
in Figure 2 is followed, it is suggested that the initial superatom
partitions in Table II be examined carefully. These partitions yield some
indication of the types of structures which will result from each partition.
For example, partition 4, C3U3 in a single superatompot, plus three carbons
in the remaining pot, should yield all structures containing a
three-membered ring possessing two double bonds or a triple bond. As there
are only two free valences, the remaining atoms can be in a single chain (as
a propyl or iso-propyl radical) or as a methyl and an ethyl group, but not as
three methyl groups.

Completeness  and  Irredundancy.  Although  a  mathematical  proof  of  the 
completeness  and  irredundancy  of  the  method  exists'5  there  is  no 
guarantee  that  the  implementation  of  the  algorithm  in  a  computer 
program  maintains  these  desired  characteristics.  Confidence  in  the 
completeness  and  irredundancy  of  a  program  of  this  complexity  can  be 
engendered  in  the  following  ways: 
1) Verification of the program's performance by another, completely
independent approach. An independent method has been developed which
enumerates, but does not construct, all isomers of compositions containing
C, H, N, and O. It is interesting that the program for simple counting
of the solutions is significantly slower than construction of all of the
solutions, despite some effort to improve the efficiency of the former
program. Thus, due to limitations of computer time, we have been limited
to compositions containing only 5 or fewer non-hydrogen atoms. For these
cases, however, the numbers of isomers obtained by both programs agree.

9d 

Balaban  has  presented  lists  of  isomers  of  C^H^,  C^Hg,  Cj_Hg  an^  ’ 

These lists were derived from his tables of graphs of degrees 2-4 and
orders (numbers of nodes) 1-5. Although we agree with his lists of
hydrocarbon isomers, the list of isomers of C^H^O is incomplete. The
structure generator provides 62 structures (as opposed to 59). The three
missing structures are:

These structures should have been produced following Balaban's method
(ref. 9d). The fact that they were not points out the difficulties inherent
in any procedure for isomer generation in which manual steps are involved
(see below).

2) Testing by manual generation of structures. Several chemists, all
without knowledge of the algorithm described above, have been given several
test cases, including C6U3, from which structures were generated by hand.
Familiarity with chemistry is no guarantee of success, as evidenced by the
performance of three chemists for the superficially simple case of
C6U3 (C6H8, Table V).
Table V. Performance of Three Chemists* in Manual Generation
of Isomers of C6U3 (C6H8). There are 159 Isomers.

             Number Generated   Type of Error

Chemist 1    161                4 duplicates; 4 omissions;
                                2 with 7 carbon atoms
Chemist 2    168                16 duplicates; 7 omissions
Chemist 3    160                2 duplicates; 1 omission

* One PhD and two graduate students.


This example indicates that for more than very trivial cases,
it is extremely difficult to avoid duplicates (tricyclics, for
example, are difficult to visualize when testing for duplicates) and
omissions. Omissions appear to result from both carelessness and
neglect of ring systems that are implausible or unfamiliar. The
program seems better at testing the chemist than vice versa. In
every instance of manual structure generation, no one has been able
to construct a legal structure that the program failed to construct.
No one has been able to detect an instance of duplication by the
program. This performance builds some confidence, but manual
verification of more complicated cases is extremely tedious and
difficult. Isomers for many empirical formulae have been generated,
and some results are tabulated in Table VI. The choice of examples
has been motivated by a desire to test all parts of the program where
errors may exist while keeping the number of isomers small enough to
allow verification. In this manner all obvious sources of error have been
checked, for example, construction of loops on loops, multiple types of
atoms of the same valence (e.g., Cl, Br, I) and examples containing atoms
of several different valences including penta- and hexavalent atoms.

3) Varying the order of generation. The structure of the
program permits additional tests by doing some operations in a
different order. For example, one variation allowed is to leave
hydrogens associated with the atoms in each partition rather than to
strip them away initially and place them on the remaining free
valences in the last step. Each such test has resulted in the same
set of isomers.

4) Using Polya enumeration at the various labelling steps
of the procedure to verify the correctness of sub-parts of the
program. Using various combinatorial formulae, one can ensure that
the results of at least parts of the program are consistent with
independent calculations. This approach was used extensively in the
development of the labelling algorithm.
In summary, the verification procedures utilized have all indicated
absence of errors in the computer implementation of the algorithm.
Also, there is no clear reason why generation of larger sets of
isomers should not also proceed correctly. The final verdict,
however, must await development of new mathematical tools for
verification by enumeration (see above) or an alternative algorithm.
Table VI. The Number of Isomers for Several Empirical Formulae

Empirical   Example               Number of   Manually
Formula     Compound              Isomers     Verified

C6H6        benzene               217         yes
C6H8        1,3-cyclohexadiene    159         yes
C6H10       cyclohexene           77          yes
C6H12       cyclohexane           25          yes
C6H14       hexane                5           yes
C6H6O       phenol                2237        no
C6H10O      cyclohexanone         747         no
C6H12O      2-hexanone            211         yes
C3H4N2      pyrazole              155         no
C3H6N2      2-pyrazoline          136         yes
C3H8N2      tetrahydropyrazole    62          no
C3H10N2     propylenediamine      14          yes
C4H9P       (pentavalent P)       110         no
Constraints. The structure generator is designed to produce a list of all
possible graph isomers (Appendix B). This list contains many structures whose
existence seems unlikely based on present chemical knowledge. In addition,
the program may be called on to generate possible structures for an unknown
in the presence of a body of data on the unknown which specify various
features (e.g., functional groups) of the molecule. In such instances
mechanisms are required for constraining the generator to produce only
structures conforming to specified rules. The implementation of the
acyclic generator possessed such a mechanism in the form of GOODLIST
(desired features) and BADLIST (unwanted features) (ref. 3), which could be
utilized during the course of structure generation.

The  complete  structure  generator  is  less  tractable.  As  in  prospective 
avoidance  of  duplicate  structures,  it  is  important  that  unwanted  structures,  or 
portions  thereof,  be  filtered  out  as  early  in  the  generation  process  as 
possible.  It  is  relatively  easy  to  specify  certain  general  types  of  constraints 
in  chemical  terms,  for  example,  the  number  of  each  of  various  types  of  rings 
or ring systems in the final structure, ring fusions, functional groups,
substructures and so forth. It is not always so easy to devise an efficient scheme
for  utilizing  a  constraint  in  the  algorithm,  however.  As  seen  in  the 
above  example  (Fig.  2)  the  expanded  superatom  partition  results  in  what  would 
be  viewed  by  the  chemist  as  several  very  different  ring  systems. 
The design of the program facilitates some types of constraints. For
example, the program may be entered at the level of combining superatoms to
generate  structures  from  a  set  of  known  sub-structures.  If  additional 
atoms  are  present  in  an  unknown  configuration,  they  can  be  treated  as  a 
separate  generation  problem,  the  results  of  which  are  finally  combined  in  all 
ways  with  the  known  superatoms.  This  approach  will  not  form  additional  two- 
connected  structures,  however.  Constraints  which  disallow  an  entire 
partition  may  be  easily  included.  For  example,  it  is  possible  to  generate 
only  pure  ring  isomers  by  "turning  off"  the  appropriate  initial  superatom 
partitions. 

Much  additional  work  remains,  however,  before  a  reasonably  complete  set  of 
constraints  can  be  included.  The  implementation  of  each  type  of  constraint 
must  be  examined  and  tested  in  detail  to  ensure  that  the  generator  remains 
thorough  and  irredundant. 

CONCLUSIONS

The  algorithm  summarized  in  this  paper  permits  the  substantial  realization  of 
the  graphical  structures  that  constitute  the  domain  of  organic  chemistry.  The 
version  of  the  algorithm  presented  here  ignores  the  tetrahedral  symmetry 
of  the  valences  of  the  carbon  atom.  However,  the  topological  framework 
readily admits of systematic tests for asymmetric centers which can then be
assigned to the dichotomous categories of the alternating group A4. This
framework also provides a simple, systematic weighting of radicals for
assignment of precedence that proves to be, if anything, even more
straightforward, comprehensive and free from ambiguity than the Cahn-
Ingold-Prelog conventions (ref. 19).

The mathematical framework of our analysis is a mapping of chemical
bonds onto the edges of topological graphs. This simplification
can lead to disparities, for example in the description of coordination
complexes, the bonds of which are non-equivalent. The symmetries of
such complexes are similar to those of certain superatoms, suggesting an
obvious and easy way to extend the system. Likewise, the system does
not now accommodate isomerism based on steric hindrance, or the association
of molecules by secondary forces, or by non-covalent constraints.
For example, from a topological standpoint, threaded molecules, or
catenanes, are disjoint graphs. Nor do we attempt to display the geometric
conformations of molecules: indeed, some topologically plausible
structures may be chemically unrealizable.

Conversely, implausible constructs, such as carbon atoms possessing
"inverted" tetrahedral geometry (ref. 20), may become reality by empirical
discovery. The constraints on chemically plausible structures depend on
the domain specified by the chemist. A DENDRAL (ref. 3) system for molecular
structure elucidation (based on the structure generator described in
this work) of molecules in frozen hydrogen matrices would have different
constraints from a version useful to biochemists.

(19) R. S. Cahn, C. K. Ingold, and V. Prelog, Angew. Chem. Internat. Ed.,
5, 385 (1966).

(20) (a) K. B. Wiberg and G. J. Burgmaier, J. Amer. Chem. Soc., 94,
7396 (1972);
(b) K. B. Wiberg, G. J. Burgmaier, K. Shen, S. J. LaPlaca, W. C.
Hamilton, and M. D. Newton, J. Amer. Chem. Soc., 94, 7402 (1972).
Chemists hitherto have been able to explore the de facto boundaries of their
domain without explicit maps. The exhaustive and efficient study of all
possible structures can now be facilitated with the assistance of computer
programs that can help assure that no possible construction has been
overlooked (ref. 21).
Appendix A. Equivalence Classes and Finite Permutation Groups.

Two members of a set of possible isomers may be defined to be
equivalent if a specified transformation of one member causes it to
be superimposable upon another member of the set. For example, there
are fifteen possible ways of attaching two chlorine and four hydrogen
atoms to a benzene ring (Chart III).


[Chart III. Dichlorobenzene structures grouped by equivalence class.]

If rotations by multiples of 60 degrees are specified as allowed
transformations, the fifteen structures fall logically into three
classes, termed "equivalence classes" (Chart III). Within each
equivalence class structures may be made superimposable by the
rotational transformation. If one element (in this case a molecular
structure) is chosen from each equivalence class, the complete set of
possible structures is determined, without duplication. It is the
task of the labelling algorithm to produce one and only one graph
labelling corresponding to one member of each equivalence class.

The  set  of  transformations  which  define  an  equivalence  class  is  termed  a 
"finite  permutation  group."  This  permutation  group  may  be  calculated  based 
on  the  symmetry  properties  of  a  graph  (or  chemical  structure  in  the  example 
of  Chart  III).  This  calculation  provides  the  mechanism  for  prospective 
avoidance  of  duplication.  These  procedures  are  described  more  fully  in  the 
accompanying paper.
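
The dichlorobenzene example can be reproduced with a few lines of code. This is a minimal sketch that enumerates every placement and then groups them retrospectively, which is the opposite of the prospective approach of the labelling algorithm; the function name and the choice of the smallest rotated image as class representative are illustrative assumptions.

```python
from itertools import combinations

def rotation_classes(ring_size=6, n_substituents=2):
    """Group substituent placements on a ring into classes under rotation."""
    classes = {}
    for positions in combinations(range(ring_size), n_substituents):
        # Class key: the smallest rotated image of this placement.
        key = min(
            tuple(sorted((p + shift) % ring_size for p in positions))
            for shift in range(ring_size)
        )
        classes.setdefault(key, []).append(positions)
    return classes

classes = rotation_classes()
print(sum(len(members) for members in classes.values()))  # 15 placements
print(len(classes))                                        # 3 equivalence classes
```

The three classes contain six, six and three placements (adjacent, once-removed and opposite positions, respectively).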


Appendix  B.  Isomerism  and  Symmetry. 


Appendix  A  introduced  the  concept  of  equivalence  classes  and  finite  permutation 
groups.  The  selection  of  transformation  (Appendix  A)  directs  the  calculation  of 
the  permutation  group  and  thus  defines  the  equivalence  classes.  Different  types 
of  transformation  may  be  allowed  depending  on  the  symmetry  properties  of  the  class 
of isomers considered. This Appendix discusses several of the possible types of
isomerism, most of which are familiar to chemists. The reader seeking a more
thorough discussion of some types of isomerism discussed below is referred to an
exposition (ref. 22) of molecular symmetry in the context of chemistry and
mathematics.

Isomers  are  most  often  defined  as  chemical  structures  possessing  the  same 
empirical  formula.  Different  concepts  of  symmetry  give  rise  to  different 
classes  of  isomers,  some  of  which  are  described  below. 

Permutational Isomers. Permutational isomers are isomers which have in
common the same skeleton and set of ligands. They differ in the distribution of
ligands about the skeleton. Gillespie et al. (ref. 23) and Klemperer (ref. 24)
have used the concept of permutational isomers to probe into unimolecular
rearrangement or isomerization reactions.

Stereoisomers. Ugi et al. (ref. 22) have defined the "chemical constitution"
of an atom to be its bonds and bonded neighbors. Those permutational isomers
which differ only by permutations of ligands at constitutionally equivalent
positions form the class of stereoisomers.

Isomers Under Rigid Molecular Symmetry. If one perceives
molecular structures as having rigid skeletons, the physical
rotational (three dimensional) symmetries and transformations may be
readily defined. Each transformation causes each atom (and bond) to
occupy the position of another or same atom (and bond) so that the
rotated structure can physically occupy its former position and at
the same time be indistinguishable from it in any way. This is the
most familiar form of symmetry. Under this type of symmetry
conformers are distinguishable and belong in distinct equivalence
classes. Every transformation is orthogonal and preserves bond
angles and bond lengths as well as maintaining true chirality.

(22) I. Ugi, D. Marquarding, H. Klusacek, G. Gokel, and P. Gillespie,
Angew. Chem. Internat. Edit., 9, 703 (1970).

(23) P. Gillespie, P. Hoffman, H. Klusacek, D. Marquarding, S.
Pfohl, F. Ramirez, E. A. Tsolis, and I. Ugi, Angew. Chem.
Internat. Edit., 10, 687 (1971).

(24) (a) W. G. Klemperer, J. Amer. Chem. Soc., 94, 6940 (1972);
(b) W. G. Klemperer, ibid., p. 8360;
(c) W. G. Klemperer, ibid., 95, 380 (1973);
(d) W. G. Klemperer, ibid., p. 2105.


If one allows other orthogonal transformations that alter chiral
properties of structures, equivalence classes result that treat both
the left-handed and right-handed forms of chiral molecules as the
"same". Thus a "mirror image" transformation when suitably defined
permits the left-handed form to exactly superimpose the right-handed
form and vice versa.


Isomers Under Total Molecular Symmetry. If in addition to the above
mentioned rigid molecular transformations one recognizes the
flexional movements of a nonrigid skeleton, a dynamic symmetry group
may be defined. Under this definition, different conformers now are
grouped together. Thus the "chair" and "boat" conformations of
cyclohexane belong to the same equivalence class under dynamic
symmetry. The permutation group of skeletal flexibility is
computable separately and independently of rigid molecular symmetry.
One can then view total molecular symmetry as the product of the two
finite permutation groups.


Isomers Under Connectivity Symmetry. The concept of connectivity
symmetry was introduced previously (METHOD section). Every
permutation of atoms and bonds onto themselves is a symmetry
transformation for connectivity symmetry if,


a) each atom is mapped into another of like species, e.g., N to
N, C to C, O to O, and


b) for every pair of atoms, the connectivity (none, single,
double, triple, ...) is preserved in the mapping, i.e., the
connectivity of the two atoms is identical to the connectivity
of the atoms they are mapped into.


One can readily recognize that transformations as defined
automatically preserve the valence and bond distribution of every
atom. It is very probable that readers accustomed to three
dimensional rotational and reflectional symmetries will tend to
equate them with the symmetries of connectivity. It is emphasized
again that connectivity symmetry does not consider bond lengths or
bond angles, and it includes certain transformations that are
conceivable but have no physical interpretation save that of
permuting the atoms and bonds.
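
Conditions (a) and (b) translate directly into a test on a candidate permutation. The sketch below is illustrative only; the data layout (a list of element symbols and a dictionary of bond orders keyed by atom pairs) and the function name are assumptions, not the representation used by the structure generator.

```python
def is_connectivity_symmetry(atoms, bonds, perm):
    """Return True if `perm` preserves atom species and all pairwise bond orders.

    atoms: list of element symbols, indexed by atom number
    bonds: dict mapping frozenset({i, j}) -> bond order (absent pairs: no bond)
    perm:  list where perm[i] is the atom that atom i is mapped into
    """
    n = len(atoms)
    if sorted(perm) != list(range(n)):
        return False  # not a permutation of the atoms
    # Condition (a): each atom is mapped into an atom of like species.
    if any(atoms[i] != atoms[perm[i]] for i in range(n)):
        return False
    # Condition (b): the connectivity of every pair of atoms is preserved.
    for i in range(n):
        for j in range(i + 1, n):
            before = bonds.get(frozenset((i, j)), 0)
            after = bonds.get(frozenset((perm[i], perm[j])), 0)
            if before != after:
                return False
    return True

# Heavy-atom skeleton C-O-C (hydrogens omitted): exchanging the two carbons is
# a connectivity symmetry, while exchanging a carbon with the oxygen is not.
atoms = ["C", "O", "C"]
bonds = {frozenset((0, 1)): 1, frozenset((1, 2)): 1}
print(is_connectivity_symmetry(atoms, bonds, [2, 1, 0]))  # True
print(is_connectivity_symmetry(atoms, bonds, [1, 0, 2]))  # False
```

Nothing in the test refers to bond lengths, bond angles or spatial coordinates, which is the point made in the paragraph above.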


Appendix  C 


Superatom Partitions. The first step is to replace the hydrogen count with the
degree of unsaturation. The number of unsaturations (rings plus double bonds) is
determined from the empirical formula in the normal way, as given in equation 1.

$$U = \tfrac{1}{2}\Bigl(2 + \sum_{i=1}^{n} (i-2)\,a_i\Bigr) \tag{1}$$

U = unsaturation
i = valence
n = maximum valence in composition
a_i = number of atoms with valence i


If the unsaturation count is zero, the formula is passed immediately to the
acyclic generator. Specifying the unsaturations as U's, the example C6H8
becomes C6U3 (hydrogen atoms are omitted by convention).
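
As a quick check of the unsaturation count, the following sketch evaluates U = (2 + sum over i of (i-2) a_i) / 2 for a composition given as a mapping from valence to number of atoms. The function name and the input format are illustrative assumptions.

```python
def unsaturation(valence_counts):
    """Unsaturation count; valence_counts maps valence i -> number of atoms a_i."""
    total = 2 + sum((i - 2) * a for i, a in valence_counts.items())
    if total % 2:
        raise ValueError("composition does not give an integral unsaturation count")
    return total // 2

# C6H8: six tetravalent carbons and eight univalent hydrogens.
print(unsaturation({4: 6, 1: 8}))   # 3
# C6H14: a saturated, acyclic composition.
print(unsaturation({4: 6, 1: 14}))  # 0
```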


There  are  several  rules  which  are  used  during  the  partitioning  scheme,  as 
follows: 


I.  The  resulting  formula  is  stripped  of  other  univalent  atoms  (e.g., 
chlorine)  as  such  atoms  cannot  be  part  of  two-connected  ring- 
superatoms.  These  univalent  atoms  are  relegated  to  the  pot  of 
remaining  atoms. 

II.  The remaining pot in a given partition (those atoms not allocated to
superatompots) can contain no unsaturations. Thus all rings and/or
multiple bonds will be generated from the superatompots.

III.  It  follows  that  every  superatompot  in  the  partition  must 
contain  at  least  two  atoms  of  valence  two  or  higher  plus  at  least 
one  unsaturation.  If  there  are  no  unsaturations  then  no  rings  could 
be  built.  In  addition,  an  unsaturation  cannot  be  placed  on  a 
single  atom.  This  rule  defines  the  minimum  number  of  atoms  and 
unsaturations  in  a  superatompot. 
IV.  The maximum number of unsaturations in a superatompot is given by
Equation 2. Superatoms must possess at least one free valence (ref. 12), so
that superatompots with no free valences, e.g., C2U3 or C3U4, are not
allowed, unless the superatompot contains all atoms in the empirical
formula (since no univalents, and thus no hydrogens, are allowed in a
superatompot, this is indeed a rare occurrence).

$$U_{max} = \tfrac{1}{2}\sum_{i=3}^{n} (i-2)\,a_i \tag{2}$$

U_max = maximum unsaturation of a superatompot
n = maximum valence in composition
i = valence
a_i = number of atoms with valence i


V.  The maximum number of superatompots for a given formula is defined by
Equation 3.

$$S_{max} = \tfrac{1}{2}\sum_{i=2}^{n} a_i \tag{3}$$

n = maximum valence in composition
S_max = maximum number of superatompots in a superatom partition
a_i = number of atoms with valence i

Note: the summation is over all atoms of valence >= 2; univalents are
not considered.
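
The two bounds of rules IV and V are simple sums over the valence list. The sketch below computes half the sum of (i-2) a_i over valences of three or more, and half the number of atoms of valence two or more. Rounding down to an integer is an assumption made here for odd sums; the function names are illustrative.

```python
def max_unsaturation(valence_counts):
    """Upper bound on unsaturations in a superatompot (rounded down)."""
    return sum((i - 2) * a for i, a in valence_counts.items() if i >= 3) // 2

def max_superatompots(valence_counts):
    """Upper bound on the number of superatompots; univalent atoms are ignored."""
    return sum(a for i, a in valence_counts.items() if i >= 2) // 2

print(max_unsaturation({4: 2}))   # 2, so a two-carbon pot with three unsaturations is excluded
print(max_superatompots({4: 6}))  # 3 for six carbons
```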


Rules I-V define the allowed partitions of a group of atoms into superatompots.
These  rules  do  not,  however,  prevent  generation  of  equivalent  partitions,  which 
would  eventually  result  in  duplicate  structures.  By  defining  a  canonical 
ordering  scheme  to  govern  partitioning,  we  prevent  equivalent  partitions.  One 
such  canonical  ordering  is  as  follows: 

Canonical  Ordering  for  Partitioning. 

a.  Partition  in  order  of  increasing  number  of  superatompots. 
b.  For each entry in each part of (a), partition in order of
decreasing size of superatompot by allocation of atoms one at a
time to the remaining pot.


c.  Each individual partition containing two or more
superatompots must be in order of equal or decreasing size of
the superatompot. In other words, the number of atoms and
unsaturations in superatompot n+1 must be equal to or less than
the number in superatompot n. The program notes the equality
of superatompots in a partition to avoid repetition.


The application of rules I-V is best illustrated through reference to
the example of C6U3. The maximum number of superatompots for this
example is three (Equation 3). There is one way to partition C6U3
into one superatompot with no remaining pot, partition 1, Table II.
Subsequent assignment of carbon atoms one at a time to the remaining
pot results in partitions 2-4, Table II. The next partition
following the sequence 1-4 would be C2U3 with C4 assigned to the
remaining pot. This partition is forbidden as C2U3 has no free
valences. The three ways to partition C6U3 into two superatompots
are indicated along with the corresponding partitions following
assignment of atoms to the remaining pot, as partitions 5-10, Table
II. There is only one unique way of partitioning C6U3 into three
superatompots, partition 11, Table II.
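
The count of eleven partitions for C6U3 can be reproduced with a short recursive enumeration. This is a simplified sketch, not the program's procedure: it assumes a composition with atoms of a single valence (four by default), represents a superatompot as an (atoms, unsaturations) pair, and keeps pots in non-increasing order so that equivalent partitions are never produced. All names are hypothetical, and the order of the output is not claimed to match Table II.

```python
def pot_free_valence(atoms, unsat, valence=4):
    """Free valence of a pot holding `atoms` atoms of one valence."""
    return 2 + (valence - 2) * atoms - 2 * unsat

def superatom_partitions(n_atoms, n_unsat, valence=4):
    """Return (pots, atoms_in_remaining_pot) pairs; pots are (atoms, unsat) tuples."""
    results = []

    def extend(pots, atoms_left, unsat_left):
        if unsat_left == 0:
            # Rule II: every unsaturation must sit in a superatompot.
            if pots:
                results.append((tuple(pots), atoms_left))
            return
        for atoms in range(atoms_left, 1, -1):       # rule III: at least two atoms
            for unsat in range(unsat_left, 0, -1):   # rule III: at least one unsaturation
                pot = (atoms, unsat)
                if pots and pot > pots[-1]:
                    continue  # canonical order: never a larger pot after a smaller one
                fv = pot_free_valence(atoms, unsat, valence)
                uses_everything = not pots and atoms == n_atoms and unsat == n_unsat
                # A pot needs a free valence unless it holds the whole composition.
                if fv >= 1 or (uses_everything and fv == 0):
                    extend(pots + [pot], atoms_left - atoms, unsat_left - unsat)

    extend([], n_atoms, n_unsat)
    return results

partitions = superatom_partitions(6, 3)
for pots, remaining in partitions:
    print(pots, "remaining pot:", remaining)
print(len(partitions))  # 11
```

Of the eleven partitions, four have one superatompot, six have two, and one has three, matching the grouping described above.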


Calculation of Free Valence. The expression for the free valence of
a superatompot is given by equation 4.

$$FV = \Bigl(2 + \sum_{i=3}^{n} (i-2)\,a_i\Bigr) - 2U \tag{4}$$

U = unsaturation of superatompot
i = valence
n = maximum valence in composition
a_i = number of atoms with valence i
FV = free valence
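
A short numerical check of the free valence expression, FV = 2 + sum over i >= 3 of (i-2) a_i, minus 2U, is sketched below. The function name and the valence-to-count input format are illustrative assumptions.

```python
def free_valence(valence_counts, unsaturation):
    """Free valence of one superatompot; valence_counts maps valence i -> a_i."""
    skeleton = 2 + sum((i - 2) * a for i, a in valence_counts.items() if i >= 3)
    return skeleton - 2 * unsaturation

print(free_valence({4: 6}, 3))  # 8: six carbons with three unsaturations
print(free_valence({4: 2}, 3))  # 0: two carbons with three unsaturations, a forbidden pot
```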
Partitioning of Free Valence. Because ring-superatoms are
two-connected structures, two valences of each atom of a superatompot
must be used to connect the atom to the ring-superatom. Thus no free
valences can be assigned to bivalent nodes in the valence list, a
maximum of one to each trivalent, a maximum of two to each
tetravalent, and so forth. The example (Fig. 2) is further
simplified in that there are only tetravalent nodes in the valence
list. Inclusion of trivalent nodes (e.g., nitrogen atoms) merely
extends the number of possible partitions. The free valences are
partitioned among the tetravalent nodes in all ways, as illustrated
in Figure 2. It is important to note that removal of atom names
makes all n-valent (n = 2 or 3 or ...) nodes in the valence list
equivalent at this stage. Thus the partitions (of eight free
valences among six tetravalent nodes) 222200, 222020, 222002, ...,
002222 are all equivalent. Only one of these partitions is
considered to avoid eventual duplication of structures.
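One way to keep a single representative from each set of equivalent partitions is to generate only non-increasing assignments. The sketch below does this for the all-tetravalent example and converts each assignment into a degree list (a_2, a_3, a_4). The helper names are hypothetical, and the non-increasing shortcut is valid only when every node has the same cap, as it does here.

```python
def free_valence_partitions(total, n_nodes, cap):
    """Non-increasing ways to give `total` free valences to `n_nodes`
    equivalent nodes, each receiving at most `cap`."""
    def extend(remaining, nodes_left, largest):
        if nodes_left == 0:
            if remaining == 0:
                yield ()
            return
        for k in range(min(largest, remaining), -1, -1):
            for tail in extend(remaining - k, nodes_left - 1, k):
                yield (k,) + tail
    return list(extend(total, n_nodes, cap))


def degree_list(partition, valence=4):
    """Count the nodes of degree 2, 3, ..., valence once free valences are removed."""
    degrees = [valence - fv for fv in partition]
    return tuple(degrees.count(d) for d in range(2, valence + 1))


# Eight free valences among six tetravalent nodes (cap = 4 - 2 = 2).
for p in free_valence_partitions(8, 6, 2):
    print("".join(map(str, p)), "->", degree_list(p))
```

For this example the sketch yields 222200, 222110 and 221111, whose degree lists are (4, 0, 2), (3, 2, 1) and (2, 4, 0).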


Calculation of Loops. There are several rules which must be
followed in consideration of loop assignment to ring-superatoms. The
minimum (MINLOOPS) and maximum (MAXLOOPS) numbers of loops for a
given valence list are designated by equations 5 and 6.

$$ \mathrm{MINLOOPS} = \max\Bigl\{0,\; a_2 + \tfrac{1}{2}\Bigl(2n - \sum_{j=2}^{n} j\,a_j\Bigr)\Bigr\} \tag{5} $$

$$ \mathrm{MAXLOOPS} = \min\Bigl\{a_2,\; \tfrac{1}{2}\sum_{j=4}^{n} (j-2)\,a_j\Bigr\} \tag{6} $$

MINLOOPS = minimum number of loops
MAXLOOPS = maximum number of loops
a_j      = number of nodes with degree j
j        = degree
n        = highest degree in list (a_n ≠ 0)

The form of the equations results from the following considerations:

1) Only secondary nodes may be assigned to loops. Nodes of
higher degree will always be in the non-loop portion of the
ring-superatom.

2) A loop, by definition, must be attached by two bonds to a
single node in the resulting ring-superatom. The loop cannot
be attached through the free valences. Thus the degree list
must possess a sufficient number of quaternary or higher degree
nodes to support the loop(s).

3) Each loop must have at least one secondary node, which is
the reason MAXLOOPS is restricted to be at most the number of
secondary nodes in the degree list (Equation 6).

4) There must be available one unsaturation for each loop
(this is implicit in the calculation of MINLOOPS and MAXLOOPS),
as each loop effectively forms a new ring.

Partitioning of Secondary Nodes between Loops and Non-Loops. For each of
the possible numbers of loops (0, 1, ...) the secondary nodes are removed from
the degree list and partitioned among the loops, remembering that the loops are
at present indistinguishable and each loop must receive at least one secondary
node. In the example (Fig. 2), starting with the degree list (4, 0, 2), there are
three ways of partitioning the four secondary nodes among two loops and the
remaining non-loop portion. Removal of the four secondary nodes from the
degree list and assignment of two, three or four of them to two loops results in
the list specified in Figure 2 as the "reduced degree list". Specification of two
loops transforms the two quaternary nodes in the degree list into two secondary
nodes. This results from the fact that two valences of a quaternary or higher
degree node must be used to support each loop. These are "special" secondary
(or higher, for atoms with valence > 4) nodes, however, as these particular nodes
will have loops attached as the structure is built up. Thus, in the example,
any secondary nodes which are found in the reduced degree list will have a loop
attached in a subsequent step. The degree list (4, 0, 2) thus becomes the
reduced degree list (2, 0, 0) in the partition specifying two loops (Fig. 2).
Similarly, the partition of one loop for the degree list (3, 2, 1) results in a
reduced degree list of (1, 2, 0) with the three original secondary nodes
partitioned among loop and non-loop portions (Figure 2).
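The bookkeeping behind the reduced degree list can be sketched as follows. The argument `loop_hosts`, which names the degree of the node chosen to carry each loop, is a hypothetical input introduced for this illustration; the text does not say how the program represents that choice.

```python
def reduced_degree_list(degree_list, loop_hosts):
    """Remove all secondary nodes and lower each loop host by two degrees.

    degree_list is (a_2, a_3, a_4, ...). loop_hosts gives, for every loop,
    the current degree of the node that carries it.
    """
    counts = list(degree_list)
    counts[0] = 0                        # secondary nodes leave the list
    for degree in loop_hosts:
        if degree < 4 or counts[degree - 2] == 0:
            raise ValueError("a loop needs a quaternary or higher degree node")
        counts[degree - 2] -= 1          # the host gives up two valences
        counts[degree - 4] += 1          # and reappears two degrees lower
    return tuple(counts)


def loop_node_counts(secondary_nodes, loops):
    """Numbers of secondary nodes that may go to the loops (at least one each)."""
    return list(range(loops, secondary_nodes + 1)) if loops else [0]


print(reduced_degree_list((4, 0, 2), [4, 4]))  # (2, 0, 0)
print(reduced_degree_list((3, 2, 1), [4]))     # (1, 2, 0)
print(loop_node_counts(4, 2))                  # [2, 3, 4]
```

The three entries returned by `loop_node_counts(4, 2)` correspond to the three ways of dividing the four secondary nodes between the two loops and the non-loop portion.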

If, after the first, second, ... nth loop partition, there remain one
or more quaternary or higher degree nodes in the reduced degree list,
the list must be tested again for the possibility of additional
loops. Each loop partition will result in an additional set of
structures. The second pass will yield those structures possessing
loops on loops, and so forth. One such superatom which would be
generated in this manner from a composition of (at least) C6U5 is 15.

[Structure 15]

Partitioning of Non-Loop Secondary Nodes among Edges. The secondary nodes
which were not assigned to loops ("non-loop secondary nodes") are partitioned
among the edges of the graphs after labelling with special secondary nodes, or
loops. Loops are not counted as edges. There are, for example, five ways to
partition four non-loop secondary nodes among the edges of the vertex-graph
possessing two quaternary nodes (Fig. 2).

Partitioning of Loop Secondary Nodes among Loops. This partitioning step is
carried out assuming indistinguishability of the loops. Each loop must receive
at least one secondary node, which limits the number of possible partitions.
Results are presented in Figure 2.
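When the edges (or loops) receiving the nodes are interchangeable, both partitioning steps reduce to listing integer partitions. The sketch below makes that assumption explicitly; it holds for the example vertex-graph, in which two quaternary nodes are joined by four equivalent edges, but a general vertex-graph would need its edge symmetries taken into account. The function name is illustrative.

```python
def partitions(total, max_parts, min_parts=0):
    """Integer partitions of `total` into between min_parts and max_parts
    positive parts, each written in non-increasing order."""
    def extend(remaining, largest, parts_left):
        if remaining == 0:
            yield ()
            return
        if parts_left == 0:
            return
        for k in range(min(largest, remaining), 0, -1):
            for tail in extend(remaining - k, k, parts_left - 1):
                yield (k,) + tail
    return [p for p in extend(total, total, max_parts) if len(p) >= min_parts]


# Four non-loop secondary nodes over four equivalent edges: five ways.
print(partitions(4, 4))
# Loop secondary nodes over two indistinguishable loops, each loop non-empty.
for m in (2, 3, 4):
    print(m, partitions(m, 2, min_parts=2))
```

The first call lists the five partitions quoted in the text; the loop case uses the same routine with the number of parts fixed at the number of loops.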


Appendix  D  -  Acyclic  generator 


A method of construction of structures similar to the method for acyclic
isomers is utilized to join multiple ring-superatoms and remaining atoms.

The DENDRAL algorithm for construction of acyclic isomers (25)
relied on the existence of a unique central atom (or bond) to every molecule.
The present acyclic generator uses the same idea. The present algorithm, though
simpler in not having to treat interconnection of atoms or ring-superatoms through
multiple bonds, is more complex because of the necessity to deal with the
symmetries of the ring-superatoms.

D1. Method for the case with even number of total atoms.

The superatom partition C2U2/C2U1/-/C2 (partition 7, Table II and
Figure 2) will be used here to illustrate this procedure. The
superatompots C2U2 and C2U1 have exactly one possible ring-superatom
for each (see Table VII).


Table VII.

Superatompot    Superatom

C2U2            -C≡C-

C2U1            >C=C<


Thus acyclic structures are to be built with -C≡C-, >C=C< and two
C's.

There are an even number of atoms and ring-superatoms. The
structures to be generated fall into two categories: (a) those with
a bond centroid; (b) those with an atom centroid.


(25) B. G. Buchanan, A. M. Duffield, and A. V. Robertson, in "Mass
Spectrometry: Techniques and Applications," G. W. A. Milne, ed., John
Wiley and Sons, Inc., 1971, p. 121.


Category A. BOND CENTROID (see Fig. 3)

Step  1.  Partition  into  Two  Parts. 

The atoms and ring-superatoms in the list of superatoms are
partitioned into two parts, with each part having exactly half the
total number of items. Each atom or ring-superatom is a single item.

Each part has to satisfy equation 7, called the Restriction on
Univalents.

Restriction on Univalents:

$$ a_1 \le \Bigl[\sum_{i=2}^{n} (i-2)\,a_i\Bigr] + 1 \tag{7} $$

i   = valence
a_i = number of atoms or superatoms of valence i
n   = maximum valence in composition

There are two ways of partitioning the four items into two parts (Fig. 3). The
restriction on univalents is satisfied in each case. The restriction will disallow
certain partitions that have "too many" (26) univalents other than hydrogens and
therefore is essential only in partitioning compositions that contain any number
of non-hydrogen univalents.
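The halving step can be illustrated with a small enumeration over the four items of the example. In this sketch each item is a hypothetical (name, free valence) pair, `#` stands for a triple bond, and the restriction on univalents is read as a_1 <= sum((i - 2) a_i) + 1, which is the form footnote 26 implies; that reading is an assumption of the illustration.

```python
from itertools import combinations


def satisfies_univalent_restriction(valences):
    """Check one part: univalents may not exceed the valences left free
    after the other items are joined, keeping one for the radical valence."""
    a1 = sum(1 for v in valences if v == 1)
    return a1 <= sum(v - 2 for v in valences if v >= 2) + 1


def equal_halves(items):
    """Unique splits of a list of (name, valence) items into two equal parts."""
    n = len(items)
    splits = set()
    for chosen in combinations(range(n), n // 2):
        left = tuple(sorted(items[i] for i in chosen))
        right = tuple(sorted(items[i] for i in range(n) if i not in chosen))
        if all(satisfies_univalent_restriction([v for _, v in part])
               for part in (left, right)):
            # Sorting the pair makes {left, right} independent of order.
            splits.add(tuple(sorted((left, right))))
    return sorted(splits)


items = [("-C#C-", 2), (">C=C<", 4), ("C", 4), ("C", 4)]
for left, right in equal_halves(items):
    print([name for name, _ in left], "|", [name for name, _ in right])
```

For the example this produces the two splits shown in Figure 3: the two ring-superatoms against the two carbons, and one ring-superatom with one carbon in each half.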


Step  2.  Generate  Radicals  from  Each  Part. 

Using a procedure described in Section D3, radicals are generated from each part
in each partition. The result of application of this procedure to the example is
shown in Table VIII.


(26) The form of equation 7 results from the fact that the number of univalents (a_1)
cannot exceed the number of free valences necessary to connect the
superatoms, leaving one valence free for the radical valence.
Table VIII. Radicals Generated from Given Parts


Part                        Radicals

(1a)  -C≡C- , >C=C<    →    -C≡C-CH=CH2
                            -CH=CH-C≡CH
                            -C(=CH2)-C≡CH

(1b)  C2               →    -CH2-CH3

(2a)  -C≡C- , C        →    -C≡C-CH3
                            -CH2-C≡CH

(2b)  >C=C< , C        →    -CH=CH-CH3
                            -C(=CH2)-CH3
                            -CH2-CH=CH2


Step 3. Form Molecules From Radicals.


The radicals are combined in unique pairs, within each initial
partition. Each pair gives rise to a unique molecule, for each of
which the centroid is a bond. There are nine such molecules for the
example chosen (Fig. 3).
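The count of nine follows from pairing the radicals of Table VIII within each partition. The sketch below copies the radicals from that table (with `#` written for a triple bond); the dictionary layout and the function name are illustrative only.

```python
from itertools import combinations_with_replacement, product

# Radicals per part, transcribed from Table VIII ('#' marks a triple bond).
RADICALS = {
    "1a": ["-C#C-CH=CH2", "-CH=CH-C#CH", "-C(=CH2)-C#CH"],
    "1b": ["-CH2-CH3"],
    "2a": ["-C#C-CH3", "-CH2-C#CH"],
    "2b": ["-CH=CH-CH3", "-C(=CH2)-CH3", "-CH2-CH=CH2"],
}


def bond_centroid_pairs(left, right):
    """Unique unordered radical pairs joined through a central bond."""
    if sorted(left) == sorted(right):
        # Identical halves: (r, s) and (s, r) would be the same molecule.
        return list(combinations_with_replacement(sorted(left), 2))
    return list(product(left, right))


molecules = (bond_centroid_pairs(RADICALS["1a"], RADICALS["1b"])
             + bond_centroid_pairs(RADICALS["2a"], RADICALS["2b"]))
print(len(molecules))  # 9
```

The first partition contributes 3 x 1 pairs and the second 2 x 3. The branch for identical halves is not exercised by this example; it is included because two equal parts would otherwise be counted twice.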
Category B. ATOM CENTROID (see Fig. 4).


Step 1. Selection of Centroid.

One must consider every unique atom or ring-superatom that has a free
valence of three or higher as an atom centroid. In the example,
of three candidates available: -C≡C-, >C=C< and C, the first is not
chosen for it has a free valence of only two.

Step 2. Partition the Rest of the Atoms.

The atom or ring-superatom chosen for the centroid is removed from the set
and the rest are partitioned into a number of parts less than or equal to the
valence of the central atom. Each part must have less than half the
total number of items being partitioned (again a ring-superatom is a
single item). Each part must satisfy the restriction on univalents (equation 7).

Thus, for the case where a carbon is the centroid, four partitions are
attempted. The condition that each part has less than or equal to one-half
the number of superatoms remaining after selection of the central atom must
be satisfied, or at most one for this example. There is exactly one
partition for three parts, i.e., one in each. The partitions are shown in
Figure 4.
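The partitioning around a chosen centroid can be sketched by generating all set partitions of the remaining items and filtering them. The size limit is passed in as a parameter (`max_size`) because it depends on the example; the value 1 used below is the limit quoted above for this case. The function names are hypothetical and `#` stands for a triple bond.

```python
def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for blocks in set_partitions(rest):
        yield [[first]] + blocks                       # first in its own block
        for i, block in enumerate(blocks):             # or added to an existing one
            yield blocks[:i] + [[first] + block] + blocks[i + 1:]


def centroid_partitions(rest, max_parts, max_size):
    """Unique partitions of `rest` into at most max_parts parts of bounded size."""
    unique = set()
    for blocks in set_partitions(list(rest)):
        if len(blocks) <= max_parts and all(len(b) <= max_size for b in blocks):
            # Canonical form removes duplicates caused by repeated items.
            unique.add(tuple(sorted(tuple(sorted(b)) for b in blocks)))
    return sorted(unique)


# Carbon centroid (valence 4): three items remain, each part may hold one.
print(centroid_partitions(["-C#C-", ">C=C<", "C"], max_parts=4, max_size=1))
# >C=C< centroid (free valence 4): the two remaining carbons are identical.
print(centroid_partitions(["-C#C-", "C", "C"], max_parts=4, max_size=1))
```

In both cases only the partition into three single-item parts survives, in agreement with Figure 4. The restriction on univalents is omitted here because the example contains no non-hydrogen univalents.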

Step  3.  Generate  Radicals. 

Once again, using the procedure described in Section D3, radicals are
constructed for each part in each partition. For example, the partition
-C≡C- gives rise to exactly one possible radical -C≡CH (Fig. 4).

Step  4.  Combine  Radicals. 

Although  in  the  example  shown  every  part  generates  only  one  radical,  in  the 
general  case  there  will  be  many  radicals  for  each  part.  If  so,  the  radicals 
must  be  combined  to  give  all  unique  combinations  of  radicals  within  each  part. 
Step 5. Form Molecules from Central Atom and Radicals.

If the centroid is not a ring-superatom but is a simple atom, then each
combination of radicals derived in Step 4 defines a single molecule that is
unique. Thus for example when C is chosen as the centroid, step 4 gives one
combination of radicals which determines a single molecule when connected
to the central C (see Figure 4).

If the centroid is a ring-superatom and the valences of the ring-superatom
are not identical then different ways of distributing the radicals around the
center may yield different molecules. Labelling of the free valences of the
central ring-superatom with radicals treated as labels (supplemented with
adequate number of hydrogens to make up the total free valence of the
ring-superatom) generates a complete and irredundant list of molecules. Thus
>C=C< is labelled with the label set:

one of -C≡CH, two of -CH3, and one of -H.

There are two unique labellings as shown in Figure 4.
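The two labellings can be checked by brute force: list every arrangement of the labels on the four free valences of >C=C< and keep one representative per symmetry class. The symmetry generators below are an assumption of this sketch; they treat the two valences on each carbon as interchangeable and the two carbons as interchangeable, so cis/trans arrangements are not distinguished. The program's own labelling algorithm is not reproduced here.

```python
from itertools import permutations

# Free valences of >C=C<: positions 0 and 1 sit on one carbon, 2 and 3 on the
# other. Generators: swap the valences on either carbon, or swap the carbons.
GENERATORS = [(1, 0, 2, 3), (0, 1, 3, 2), (2, 3, 0, 1)]


def symmetry_group(generators):
    """Close a set of position permutations under composition."""
    identity = tuple(range(len(generators[0])))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for h in generators:
            composed = tuple(g[h[i]] for i in range(len(h)))
            if composed not in group:
                group.add(composed)
                frontier.append(composed)
    return group


def unique_labellings(labels, generators=GENERATORS):
    """One canonical representative per symmetry class of labellings."""
    group = symmetry_group(generators)
    canonical = set()
    for arrangement in set(permutations(labels)):
        images = (tuple(arrangement[g[i]] for i in range(len(g))) for g in group)
        canonical.add(min(images))
    return sorted(canonical)


labels = ["-C#CH", "-CH3", "-CH3", "-H"]   # '#' marks a triple bond
print(len(unique_labellings(labels)))  # 2
```

The two classes correspond to the ethynyl radical sharing a carbon with the hydrogen or with a methyl group. The same routine applied to one radical-valence and three hydrogens gives a single labelling.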

D2. Method for odd number of total atoms.


With an odd number of total atoms, no structures can be generated with a bond
centroid. Only atom centroids are possible (10, 25). However, it is
possible for structures to be built with a bivalent atom at the centroid. Thus
the procedure outlined in Category B above is followed, in this case also
allowing a bivalent atom as the centroid.

D3. Generation of Radicals.

The goal of this procedure is to generate all radicals from a list of
atoms and ring-superatoms. A radical is defined to be an atom or
superatom with a single free valence. When a composition of atoms and
ring-superatoms is presented, from which radicals are to be constructed, two
special cases are recognized.

Special Case 1. Only One Atom in List of Atoms.

When only one atom which is not a ring-superatom is in the list, only one
radical is possible. For example, with one C, the radical -CH3 is the
only possibility.
Special Case 2. Only One Ring-superatom in List of Ring-superatoms.

In this case, depending upon the symmetry of the ring-superatom, several
radicals may be possible. This is determined by labelling the free valences
of the ring-superatom with one label of a special type, a "radical-valence".

Example: A list of ring-superatoms consists of one ring-superatom, 16.

[Structure 16]

Two radicals result from labelling with one radical valence.

[Structures 17 and 18]


General Case

Radicals have uniquely defined centroids as well. The centroid is
always an atom of valence two or higher. The steps for construction of
radicals are as follows.


Step 1. Selection of Atom Centroid.

Any bivalent or higher valent atom or ring-superatom is a valid candidate to
be the centroid of a radical. Thus, for example, for the composition
-C≡C-, >C=C< (see part 1a in Figure 3) both are valid centroids (Figure 5).
Step 2. Partition the Rest of the Atoms.

The atom chosen for the centroid is removed from the list of superatoms. One
of the valences of the centroid is to remain free (the radical valence).
Therefore, the rest of the atoms in the list are partitioned into less than or
equal to (valence of centroid - 1) parts. Of course, each part should
satisfy the restriction on univalents (equation 7) but for constructing
radicals there is no restriction on the size of the parts.

Step  3.  Form  Radicals  from  Each  Part. 

The  procedure  to  construct  radicals  is  freshly  invoked  on  each  part  thus 
generating  radicals.  Each  part  in  Figure  5  gives  rise  to  only  one  radical,  each 
arising  from  special  case  2. 

Step  4.  Combine  Radicals  in  Each  Part. 

For the example in Figure 5, each part yields only one radical. In a more
general situation, where the rest of the list of superatoms after selection of a
centroid is partitioned into several parts, and where each part yields
several radicals, the radicals are combined to determine all unique combinations
of radicals.

Step  5.  Label  Central  Atom  with  Radicals. 

If the center is an atom (not a ring-superatom) then each unique combination
defines a single unique radical.

If  the  center  is  a  ring-superatom,  the  radicals  are  determined  by  labelling  the 
center  with  a  set  of  labels  which  includes:  i)  the  radicals;  ii)  a  leading 
radical-valence;  iii)  an  adequate  number  of  hydrogens  to  make  up  the 
remaining  free  valences  of  the  ring-superatom.  One  selection  of  center  gives 
one  radical  and  the  other  gives  two  more,  to  complete  a  list  of  three 
radicals  for  the  example  chosen  (Fig.  5). 

Summary 

For the example chosen to illustrate the operation of the acyclic generator,
twelve isomers are generated, nine shown in Figure 3 and three shown in
Figure 4.
FIGURE CAPTIONS


Figure  1.  Outline  of  the  strategy  for  structure  generation. 

Figure 2.  Major steps in the generation of isomers as illustrated for
C6H8. The example outlines the method for one
superatom partition, that which allocates all atoms to
a single superatompot with no atoms in the remaining pot.

Figure  3.  Operation  of  the  acyclic  generator  for  the  case  of  a  bond 
as  a  centroid  for  the  structures. 

Figure  4.  Operation  of  the  acyclic  generator  for  the  case  of  an 
atom  or  superatom  as  a  centroid  for  the  structures. 

Figure 5.  Outline of the method for generation of radicals which
are eventually combined by the acyclic generator to yield
final structures.
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Empirical  Formula 
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